A tractable online learning algorithm for the multinomial logit contextual bandit

Abstract

In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describes the products, and that the mean utility of a product is linear in the values of these attributes. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model and consider the decision maker's problem of dynamically learning the model parameters while optimizing cumulative revenue over the selling horizon T. Though this problem has recently attracted considerable attention, many existing methods often involve solving an intractable non-convex optimization problem. Their theoretical performance guarantees depend on a problem-dependent parameter which could be prohibitively large. In particular, current algorithms for this problem have regret bounded by $O(\sqrt{\kappa d T})$, where $\kappa$ is a problem-dependent constant that may have an exponential dependency on the number of attributes, d. In this paper, we propose an optimistic algorithm and show that the regret is bounded by $O(\sqrt{dT} + \kappa)$, significantly improving the performance over existing methods. Further, we propose a convex relaxation of the optimization step, which allows for tractable decision-making while retaining the favorable regret guarantee. We also demonstrate that our algorithm has robust performance for varying $\kappa$ values through numerical experiments.
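As a rough illustration of the choice model the abstract describes, the sketch below computes MNL purchase probabilities and the expected revenue of one offered assortment when each product's mean utility is linear in its attributes. It is a minimal sketch, not the paper's algorithm: the function names, the parameter vector `theta`, and the convention that the no-purchase option has utility 0 are assumptions made for illustration.

```python
import math


def mnl_choice_probabilities(theta, assortment_features):
    """Return (p_no_purchase, [p_i for each offered product]).

    Assumptions (hypothetical, for illustration only):
    - the mean utility of product i is the dot product theta . x_i
    - the no-purchase option has utility 0, so its weight is exp(0) = 1
    """
    weights = []
    for x in assortment_features:
        utility = sum(t * v for t, v in zip(theta, x))
        weights.append(math.exp(utility))
    denom = 1.0 + sum(weights)
    return 1.0 / denom, [w / denom for w in weights]


def expected_revenue(theta, assortment_features, revenues):
    """Expected revenue of one round for the offered assortment."""
    _, probs = mnl_choice_probabilities(theta, assortment_features)
    return sum(r * p for r, p in zip(revenues, probs))


# Two products with d = 2 attributes each; theta = 0 makes all options equally likely.
theta = [0.0, 0.0]
features = [[1.0, 0.5], [0.2, 0.8]]
p_none, p_items = mnl_choice_probabilities(theta, features)
revenue = expected_revenue(theta, features, [3.0, 6.0])
```

In a bandit setting the true `theta` is unknown; a learner would replace it with an estimate (or an optimistic surrogate) when choosing which assortment to offer in each round.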

Related resources

An MM Algorithm for General Mixed Multinomial Logit Models

This paper develops a new technique for estimating mixed logit models with a simple minorization-maximization (MM) algorithm. The algorithm requires minimal coding and is easy to implement for a variety of mixed logit models. Most importantly, the algorithm has a very low cost per iteration relative to current methods, producing substantial computational savings. In addition, the method is asym...

The Generalized Multinomial Logit Model

The so-called “mixed” or “heterogeneous” multinomial logit (MIXL) model has become popular in a number of fields, especially Marketing, Health Economics and Industrial Organization. In most applications of the model, the vector of consumer utility weights on product attributes is assumed to have a multivariate normal (MVN) distribution in the population. Thus, some consumers care more about som...

A Contextual-Bandit Algorithm for Mobile Context-Aware Recommender System

Most existing approaches in Mobile Context-Aware Recommender Systems focus on recommending relevant items to users taking into account contextual information, such as time, location, or social aspects. However, none of them has considered the problem of user’s content evolution. We introduce in this paper an algorithm that tackles this dynamicity. It is based on dynamic exploration/exploitation...

A Contextual Bandit Approach for Stream-Based Active Learning

Contextual bandit algorithms – a class of multiarmed bandit algorithms that exploit the contextual information – have been shown to be effective in solving sequential decision making problems under uncertainty. A common assumption adopted in the literature is that the realized (ground truth) reward by taking the selected action is observed by the learner at no cost, which, however, is not reali...

Contextual Bandit Algorithms with Supervised Learning Guarantees

We address the problem of competing with any large set of N policies in the nonstochastic bandit setting, where the learner must repeatedly select among K actions but observes only the reward of the chosen action. We present a modification of the Exp4 algorithm of Auer et al. [2], called Exp4.P, which with high probability incurs regret at most $O(\sqrt{KT \ln N})$. Such a bound does not hold for Exp4 ...

Journal

Journal title: European Journal of Operational Research

Year: 2023

ISSN: 1872-6860, 0377-2217

DOI: https://doi.org/10.1016/j.ejor.2023.02.036